Skip to content

Conversation

@uditagarwal97
Copy link
Contributor

Unlike zlib (AFAIK), zstd stores the size of the original, uncompressed buffer in the header and provides the following API to retrieve it:

unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSize);

This PR adds a wrapper around this API in llvm::compression::zstd namespace so that the users won't have to keep track of the original, uncompressed size when passing around zstd-compressed buffers.

@github-actions
Copy link

github-actions bot commented Sep 2, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Sep 2, 2024

@llvm/pr-subscribers-llvm-support

Author: Udit Agarwal (uditagarwal97)

Changes

Unlike zlib (AFAIK), zstd stores the size of the original, uncompressed buffer in the header and provides the following API to retrieve it:

unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSize);

This PR adds a wrapper around this API in llvm::compression::zstd namespace so that the users won't have to keep track of the original, uncompressed size when passing around zstd-compressed buffers.


Full diff: https://github.com/llvm/llvm-project/pull/107020.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Support/Compression.h (+2)
  • (modified) llvm/lib/Support/Compression.cpp (+18)
  • (modified) llvm/unittests/Support/CompressionTest.cpp (+8-1)
diff --git a/llvm/include/llvm/Support/Compression.h b/llvm/include/llvm/Support/Compression.h
index 2a8da9e96d356f..da7b6f88d955b0 100644
--- a/llvm/include/llvm/Support/Compression.h
+++ b/llvm/include/llvm/Support/Compression.h
@@ -71,6 +71,8 @@ Error decompress(ArrayRef<uint8_t> Input, uint8_t *Output,
 Error decompress(ArrayRef<uint8_t> Input, SmallVectorImpl<uint8_t> &Output,
                  size_t UncompressedSize);
 
+// Get the size of the decompressed data.
+Error getDecompressedSize(ArrayRef<uint8_t> Input, size_t &UncompressedSize);
 } // End of namespace zstd
 
 enum class Format {
diff --git a/llvm/lib/Support/Compression.cpp b/llvm/lib/Support/Compression.cpp
index badaf68ab59cd0..ab012dfb5612b1 100644
--- a/llvm/lib/Support/Compression.cpp
+++ b/llvm/lib/Support/Compression.cpp
@@ -224,6 +224,20 @@ Error zstd::decompress(ArrayRef<uint8_t> Input,
   return E;
 }
 
+Error zstd::getDecompressedSize(ArrayRef<uint8_t> Input,
+                                size_t &UncompressedSize) {
+
+  unsigned long long Res = ZSTD_getFrameContentSize(Input.data(), Input.size());
+
+  // ZSTD_getFrameContentSize returns unsigned long long, but the size
+  // of uncompressed data should be bounded by size_t.
+  UncompressedSize = static_cast<size_t>(Res);
+
+  return ZSTD_isError(Res) ? make_error<StringError>(ZSTD_getErrorName(Res),
+                                                     inconvertibleErrorCode())
+                           : Error::success();
+}
+
 #else
 bool zstd::isAvailable() { return false; }
 void zstd::compress(ArrayRef<uint8_t> Input,
@@ -240,4 +254,8 @@ Error zstd::decompress(ArrayRef<uint8_t> Input,
                        size_t UncompressedSize) {
   llvm_unreachable("zstd::decompress is unavailable");
 }
+Error zstd::getDecompressedSize(ArrayRef<uint8_t> Input,
+                                size_t &UncompressedSize) {
+  llvm_unreachable("zstd::getDecompressedSize is unavailable");
+}
 #endif
diff --git a/llvm/unittests/Support/CompressionTest.cpp b/llvm/unittests/Support/CompressionTest.cpp
index 5d326cafbe3a1c..058be206d28b1f 100644
--- a/llvm/unittests/Support/CompressionTest.cpp
+++ b/llvm/unittests/Support/CompressionTest.cpp
@@ -73,8 +73,15 @@ static void testZstdCompression(StringRef Input, int Level) {
   SmallVector<uint8_t, 0> Uncompressed;
   zstd::compress(arrayRefFromStringRef(Input), Compressed, Level);
 
+  // Check that getDecompressedSize returns the size of the original buffer.
+  size_t DecompressedSize;
+  Error E = zstd::getDecompressedSize(Compressed, DecompressedSize);
+  EXPECT_FALSE(std::move(E));
+
+  EXPECT_EQ(DecompressedSize, Input.size());
+
   // Check that uncompressed buffer is the same as original.
-  Error E = zstd::decompress(Compressed, Uncompressed, Input.size());
+  E = zstd::decompress(Compressed, Uncompressed, Input.size());
   EXPECT_FALSE(std::move(E));
   EXPECT_EQ(Input, toStringRef(Uncompressed));
 

@uditagarwal97
Copy link
Contributor Author

@ckissane @MaskRay FYI

@uditagarwal97
Copy link
Contributor Author

uditagarwal97 commented Sep 8, 2024

Ping

1 similar comment
@uditagarwal97
Copy link
Contributor Author

Ping

@uditagarwal97
Copy link
Contributor Author

Ping @ckissane @MaskRay Can I please get some review on this?

return E;
}

Error zstd::getDecompressedSize(ArrayRef<uint8_t> Input,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more conventional to return Expected<uint64_t> and avoid output parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in c9046d7

@MaskRay
Copy link
Member

MaskRay commented Oct 14, 2024

Do you have an in-tree use case for this API? If not, I don't think we want to add a wrapper here.

@uditagarwal97
Copy link
Contributor Author

uditagarwal97 commented May 19, 2025

Do you have an in-tree use case for this API? If not, I don't think we want to add a wrapper here.

There are efforts going on to upstream SYCL RT to LLVM (https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479/4). In SYCL offloading, we use zstd to compress device code during compilation and the decompress deice code on demand at runtime.

Without any API to get decompressed size (proposed changes in this PR), the component using ZSTD compression has to always pass on the decompressed size (original blob size) to the component doing decompression - that's what we currently do in clang-offload-bundler. However, this is redundant since ZSTD already stores the original size in compressed blob headers (if not using ZSTD streaming mode - which LLVM anyway doesn't support) and we can use ZSTD_getFrameContentSize to get that.

Currently, in SYCL RT, we had to link ZSTD library along with LLVMSupport, just so that we can get the decompressed size via ZSTD_getFrameContentSize and so, adding wrapper to this API in LLVMSupport would simplify things a lot and would help with upstreaming.
Moreover, after this change, we would be able to simplify compressed offload bundle data structure (https://clang.llvm.org/docs/ClangOffloadBundler.html#compression-and-decompression) and can get rid of the "Uncompressed binary size" field altogether - saving 32 bits. 😱

@uditagarwal97 uditagarwal97 requested a review from MaskRay May 19, 2025 22:30
@MaskRay
Copy link
Member

MaskRay commented May 31, 2025

As an alternative, you might use ZSTD_getFrameContentSize directly within the to-be-upstreamed runtime. lld/ELF/OutputSections.cpp has an example using the advanced APIs as we are not yet comfortable exposing these in LLVMSupport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants